Products and methods for analyzing nucleic acids including identification of substitutions, insertions and deletions

ABSTRACT

Systems and methods for detecting monomer changes in a sample when an unknown quantity of expected monomers may also be present. Homogeneous and heterogeneous samples are exposed to polymer probes for hybridization. The hybridization affinities of the polymer probes to the samples are then compared to determine differences between the polymers in the samples. Accordingly, deletion, substitution and insertion mutations may be detected in a heterogeneous sample of nucleic acids.

[0001] This application claims the benefit of U.S. Provisional Application No. 60/087,567, filed Jun. 1, 1998, which is hereby incorporated by reference.

GOVERNMENT RIGHTS NOTICE

[0002] Portions of the material in this specification arose under the cooperative agreement 70NANB5H1031 between Affymetrix, Inc. and the Department of Commerce through the National Institute of Standards and Technology.

BACKGROUND OF THE INVENTION

[0003] The present invention is related to computer systems for analyzing polymers. More particularly, the invention provides systems and methods for analyzing biopolymers, such as nucleic acids, in order to identify monomer substitutions, insertions and deletions.

[0004] U.S. Pat. No. 5,424,186, which is hereby incorporated by reference for all purposes, describes pioneering techniques for, among other things, forming and using high density arrays of molecules such as oligonucleotides, peptides, polysaccharides, and other materials. Arrays of oligonucleotides, for example, are formed on the surface by sequentially removing a photoremovable group from a surface, coupling a monomer to the exposed region of the surface, and repeating the process. These techniques have been used to form extremely dense arrays of oligonucleotides, peptides, and other materials. Such arrays are useful in, for example, drug development, oligonucleotide sequencing, oligonucleotide sequence checking, and a variety of other applications. The synthesis technology associated with this invention has come to be known as “VLSIPS” or “Very Large Scale Immobilized Polymer Synthesis” technology.

[0005] Additional techniques for forming and using such arrays are described in U.S. Pat. No. 5,384,261, which is also incorporated by reference for all purposes. Such techniques include systems for mechanically protecting portions of a substrate (or chip), and selectively deprotecting/coupling materials to the substrate. These techniques are now known as “VLSIPS II.” Still further techniques for array synthesis are provided in U.S. application Ser. No. 08/327,512, also incorporated herein by reference for all purposes.

[0006] Dense arrays fabricated according to these techniques are used, for example, to screen the array of probes to determine which probe(s) are complementary to a target of interest. According to one specific aspect of the inventions described above, the array is exposed to a labeled target. The target may be labeled with a wide variety of materials, but an exemplary label is a fluorescein label. The array is then scanned with a confocal microscope based detection system, or other related system, to identify where the target has bound to the array. Other labels include, but are not limited to, radioactive labels, large molecule labels, and others.

[0007] Innovative computer-aided techniques for identifying monomers in sample polymers are disclosed in U.S. patent application Ser. No. 08/531,137 (attorney docket no. 16528X008210), No. 08/528,656 (attorney docket no. 16528X-017600), and No. 08/618,834 (attorney docket no. 16528X-016400), which are all hereby incorporated by reference for all purposes. However, improved systems and methods are still needed to evaluate, analyze, and process the vast amount of information now used and made available by these pioneering technologies.

[0008] One area that can be more thoroughly explored is identifying changes in a heterogeneous sample of polymers. For example, biopsies from cancerous areas or tumors of a patient's body will typically include genetic material from both normal cells and cancerous cells. In order to better diagnose a cancerous area, it would be beneficial to be able to identify mutations in the p53 genes of a heterogeneous sample, especially where an unknown quantity of wild-type p53 genes are present.

SUMMARY OF THE INVENTION

[0009] The present invention provides techniques for detecting monomer changes in a heterogeneous sample when an unknown quantity of expected (e.g., wild-type) monomers may also be present. Heterogeneous and homogeneous samples are exposed to polymer probes for hybridization, where the homogeneous sample acts as a reference. The hybridization affinities of the polymer probes to the heterogeneous and homogeneous samples are then compared to determine differences between the polymers in the samples. For example, embodiments of the invention allow for the detection of deletion, substitution and insertion mutations in a heterogeneous sample of nucleic acids. Several embodiments of the invention are as follows.

[0010] In one embodiment of the invention, a method of analyzing a heterogeneous sample of nucleic acids is provided. Hybridization affinities of a homogeneous sample of nucleic acids to a plurality of nucleic acid probes are received. Hybridization affinities of the heterogeneous sample of nucleic acids to the plurality of nucleic acid probes are also received. The hybridization affinities of the homogeneous and heterogeneous samples are then compared to identify a mutation in the heterogeneous sample. In a preferred embodiment, a first ratio of a hybridization affinity of a non-wild-type probe to a hybridization affinity of a wild-type probe for the homogeneous sample of nucleic acids is calculated and a second ratio of a hybridization affinity of a non-wild-type probe to a hybridization affinity of a wild-type probe for the heterogeneous sample of nucleic acids is calculated. A mutation is identified in the heterogeneous sample if the first ratio is less than the second ratio.
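The ratio comparison of this embodiment can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, argument names, and intensity values are hypothetical and do not come from the specification, and the intensities are assumed to be positive, background-corrected values.

```python
def ratio_indicates_mutation(ref_wt, ref_nonwt, sample_wt, sample_nonwt):
    """Return True if the non-wild-type/wild-type ratio is higher in the sample.

    All four arguments are hybridization affinities (e.g., photon counts).
    The wild-type affinities are assumed to be positive (hypothetical
    precondition, not stated in the specification).
    """
    first_ratio = ref_nonwt / ref_wt          # homogeneous (reference) sample
    second_ratio = sample_nonwt / sample_wt   # heterogeneous sample
    return first_ratio < second_ratio


# Hypothetical intensities chosen only for illustration.
print(ratio_indicates_mutation(1000.0, 50.0, 900.0, 200.0))
```

With these example values the first ratio is 0.05 and the second is about 0.22, so the comparison flags a possible mutation.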

[0011] In another embodiment of the invention, a method of analyzing a heterogeneous sample of nucleic acids is provided. Hybridization affinities of a homogeneous sample of nucleic acids to a plurality of nucleic acid probes are received. The plurality of nucleic acid probes include a wild-type probe and at least one non-wild-type probe. Hybridization affinities of a heterogeneous sample of nucleic acids to the plurality of nucleic acid probes are also received. A first ratio of a hybridization affinity of a wild-type probe to a hybridization affinity of a non-wild-type probe for the homogeneous sample of nucleic acids is calculated. A second ratio of a hybridization affinity of a wild-type probe to a hybridization affinity of a non-wild-type probe for the heterogeneous sample of nucleic acids is calculated. A third ratio of the difference between the first and second ratios to the first ratio is then calculated. It is determined that there is a mutation in the heterogeneous sample if the third ratio is above a predetermined threshold, the mutation being identified by the non-wild-type probe.
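The three ratios of this embodiment can be written out as a short Python sketch. The function names, the example intensities, and the threshold value of 0.5 are assumptions made for illustration; the specification says only that the threshold is predetermined.

```python
def relative_ratio_change(ref_wt, ref_nonwt, sample_wt, sample_nonwt):
    """Return the third ratio: (first - second) / first.

    first  = wild-type / non-wild-type affinity for the homogeneous sample
    second = wild-type / non-wild-type affinity for the heterogeneous sample
    All affinities are assumed to be positive.
    """
    first_ratio = ref_wt / ref_nonwt
    second_ratio = sample_wt / sample_nonwt
    return (first_ratio - second_ratio) / first_ratio


def has_mutation(ref_wt, ref_nonwt, sample_wt, sample_nonwt, threshold):
    """Return True if the third ratio exceeds the given threshold."""
    change = relative_ratio_change(ref_wt, ref_nonwt, sample_wt, sample_nonwt)
    return change > threshold


# Hypothetical intensities and a hypothetical threshold of 0.5.
print(relative_ratio_change(1000.0, 50.0, 900.0, 200.0))
print(has_mutation(1000.0, 50.0, 900.0, 200.0, threshold=0.5))
```

Dividing by the first ratio expresses the change as a fraction of the reference value, so one threshold can be applied to probes whose absolute intensities differ.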

[0012] A further understanding of the nature and advantages of the inventions herein may be realized by reference to the remaining portions of the specification and the attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0013] FIG. 1 illustrates an example of a computer system that may be used to execute software embodiments of the present invention;

[0014] FIG. 2 shows a system block diagram of a typical computer system;

[0015] FIG. 3 illustrates an overall system for forming and analyzing arrays of biological materials such as DNA or RNA;

[0016] FIG. 4 is an illustration of an embodiment of software for the overall system;

[0017] FIG. 5 illustrates the global layout of a chip formed in the overall system;

[0018] FIG. 6 illustrates conceptually the binding of nucleic acid probes on chips to a labeled target;

[0019] FIG. 7 illustrates nucleic acid probes arranged in lanes on a chip;

[0020] FIG. 8 illustrates a hybridization pattern of a target on a chip with a reference sequence as in FIG. 7;

[0021] FIG. 9 illustrates standard and standard variant tilings;

[0022] FIG. 10 shows a bar graph including hybridization affinity of a homogeneous sample and a heterogeneous sample;

[0023] FIG. 11 shows a flowchart of a process that analyzes hybridization affinities for homogeneous and heterogeneous samples;

[0024] FIG. 12 shows a section of the p53 gene including intron, exon and splice junction regions;

[0025] FIG. 13 shows a flowchart of a process of hybridization affinity comparison;

[0026] FIG. 14 shows a flowchart of a process of mutation detection in a heterogeneous sample of nucleic acids;

[0027] FIG. 15 shows a flowchart of a process of testing for a deletion mutation;

[0028] FIG. 16 shows a flowchart of a process of testing for a substitution mutation;

[0029] FIG. 17 shows homogeneous and heterogeneous sample probe set intensities;

[0030] FIG. 18 shows a flowchart of a process of a substitution filter;

[0031] FIG. 19 shows a flowchart of a process of testing for a substitution mutation; and

[0032] FIGS. 20A-20G show formulas that are utilized in a preferred embodiment.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0033] The present invention provides innovative systems and methods of analyzing polymers. In the description that follows, the invention will be described in reference to a preferred embodiment that identifies nucleotide mutations such as substitutions, insertions or deletions, such as in the p53 gene. However, the invention may be advantageously applied to other polymers including peptides, polysaccharides, and the like for various applications. Accordingly, the description is provided for purposes of illustration and not for limiting the spirit and scope of the invention.

[0034] FIG. 1 illustrates an example of a computer system that may be used to execute software embodiments of the present invention. FIG. 1 shows a computer system 1 that includes a monitor 3, screen 5, cabinet 7, keyboard 9, and mouse 11. Mouse 11 may have one or more buttons such as mouse buttons 13. Cabinet 7 houses a CD-ROM drive 15 and a hard drive (not shown) that may be utilized to store and retrieve software programs including computer code incorporating the present invention or data for use with the invention. Although a CD-ROM 17 is shown as the computer readable medium, other computer readable media including floppy disks, DRAM, hard drives, flash memory, tape, and the like may be utilized. Cabinet 7 also houses familiar computer components (not shown) such as a processor, memory, and the like.

[0035] FIG. 2 shows a system block diagram of computer system 1 used to execute software embodiments of the present invention. As in FIG. 1, computer system 1 includes monitor 3 and keyboard 9. Computer system 1 further includes subsystems such as a central processor 50, system memory 52, I/O controller 54, display adapter 56, removable disk 58, fixed disk 60, network interface 62, and speaker 64. Removable disk 58 is representative of removable computer readable media like floppies, tape, CD-ROM, removable hard drive, flash memory, and the like. Fixed disk 60 is representative of an internal hard drive or the like. Other computer systems suitable for use with the present invention may include additional or fewer subsystems. For example, another computer system could include more than one processor 50 (i.e., a multi-processor system) or memory cache.

[0036] Arrows such as 66 represent the system bus architecture of computer system 1. However, these arrows are illustrative of any interconnection scheme serving to link the subsystems. For example, display adapter 56 may be connected to central processor 50 through a local bus or the system may include a memory cache. Computer system 1 shown in FIG. 2 is but an example of a computer system suitable for use with the present invention. Other configurations of subsystems suitable for use with the present invention will be readily apparent to one of ordinary skill in the art. In one embodiment, the computer system is a workstation from Sun Microsystems.

[0037] The VLSIPS™ technology provides methods of making very large arrays of oligonucleotide probes on very small chips. See U.S. Pat. No. 5,143,854 and PCT patent publication Nos. WO 90/15070 and 92/10092, each of which is hereby incorporated by reference for all purposes. The oligonucleotide probes on the chip are used to detect complementary nucleic acid sequences in a sample nucleic acid of interest (the “target” nucleic acid).

[0038] The present invention provides methods of analyzing hybridization affinity or intensity data for a chip including probes that has been exposed to a labeled polymer. In a representative embodiment, the data represent fluorescence intensity from a biological array, but the data may also represent other data such as radioactive intensity. Therefore, the present invention is not limited to analyzing fluorescent measurements of hybridization but may be readily utilized to analyze other measurements of hybridization.

[0039] For purposes of illustration, a computer system that designs a chip mask, synthesizes the probes on the chip, labels the nucleic acids, and scans the hybridized nucleic acid probes will be described. Such a system is fully described in U.S. patent application Ser. No. 08/249,188, which is hereby incorporated by reference for all purposes. The present invention may be used within such a system, in another system, or separately for analyzing data, such as at remote locations.

[0040] FIG. 3 illustrates a computerized system for forming and analyzing arrays of biological materials. A computer 100 is used to design arrays of biological polymers such as RNA or DNA. The computer may be, for example, an appropriately programmed IBM compatible personal computer running Windows NT including appropriate memory and a CPU as shown in FIGS. 1 and 2. Computer system 100 obtains inputs from a user regarding characteristics of a gene of interest, and other inputs regarding the desired features of the array. Optionally, the computer system may obtain information regarding a specific genetic sequence of interest from an external or internal database 102 such as GenBank. The output of computer system 100 is a set of chip design computer files 104 in the form of, for example, a switch matrix, as described in PCT application WO 92/10092, and other associated computer files.

[0041] The chip design files are provided to a system or process 106 that designs the lithographic masks used in the fabrication of arrays of molecules such as DNA. System or process 106 may include the hardware necessary to manufacture masks 110 and also the necessary computer hardware and software 108 necessary to lay the mask patterns out on the mask in an efficient manner. As with the other features in FIG. 3, such equipment may or may not be located at the same physical site, but is shown together for ease of illustration in FIG. 3. System or process 106 generates masks 110 or other synthesis patterns such as chrome-on-glass masks for use in the fabrication of polymer arrays.

[0042] Masks 110, as well as selected information relating to the design of the chips from computer system 100, are used in a synthesis system 112. Synthesis system 112 includes the necessary hardware and software used to fabricate arrays of polymers on a substrate or chip 114. For example, synthesizer 112 includes a light source 116 and a chemical flow cell 118 on which the substrate or chip 114 is placed. Mask 110 is placed between the light source and the substrate/chip, and the two are translated relative to each other at appropriate times for deprotection of selected regions of the chip. Selected chemical reagents are directed through flow cell 118 for coupling to deprotected regions, as well as for washing and other operations. All operations are preferably directed by an appropriately programmed computer 119, which may or may not be the same computer as the computer(s) used in mask design and mask making.

[0043] The substrates fabricated by synthesis system 112 are optionally diced into smaller chips and exposed to marked targets. The targets may or may not be complementary to one or more of the molecules on the substrate. The targets are marked with a label such as a fluorescein label (indicated by an asterisk in FIG. 3) and placed in a scanning system 120. Scanning system 120 again operates under the direction of an appropriately programmed digital computer 122, which also may or may not be the same computer as the computers used in synthesis, mask making, and mask design.

[0044] Scanner 120 includes a detection device 124 such as a confocal microscope or CCD (charge-coupled device) that is used to detect the location where labeled target (*) has bound to the substrate. The output of scanner 120 is an image file(s) 124 indicating, in the case of fluorescein labeled target, the fluorescence intensity (photon counts or other related measurements, such as voltage) as a function of position on the substrate. Since higher photon counts will be observed where the labeled target has bound more strongly to the array of polymers, and since the monomer sequence of the polymers on the substrate is known as a function of position, it becomes possible to determine the sequence(s) of polymer(s) on the substrate that are complementary to the target.

[0045] Image file 124 may be provided as input to an analysis system 126 that incorporates embodiments of the present invention. Again, the analysis system may be any one of a wide variety of computer systems. The present invention provides systems and methods of analyzing hybridization data, which may include chip design files and image files, and providing appropriate output 128. As an example, the present invention may be used to determine the position of mutations in a sample of DNA or RNA.

[0046] FIG. 4 provides a simplified illustration of the overall software system used in the operation of one embodiment of the invention. As shown in FIG. 4, the system first identifies the genetic sequence(s) or targets that would be of interest in a particular analysis at a step 202. The sequences of interest may, for example, be normal or mutant portions of a gene, genes that identify heredity, provide forensic information, genes for cancer detection, or pathology. Sequence selection may be provided via manual input of text files or may be from external sources such as GenBank. At a step 204 the system evaluates the gene to determine or assist the user in determining which probes would be desirable on the chip, and provides an appropriate “layout” on the chip for the probes.

[0047] The chip usually includes probes that are complementary to a reference nucleic acid sequence, which has a known sequence. A wild-type probe is a probe that will ideally hybridize with the reference sequence and thus a wild-type gene (also called the chip wild-type) would ideally hybridize with wild-type probes on the chip. The sample or target sequence is typically similar to the reference sequence except for the presence of substitutions, insertions, deletions, and the like. The layout implements desired characteristics such as arrangement on the chip that permits “reading” of genetic sequence and/or minimization of edge effects, ease of synthesis, and the like.

[0048] In order to better understand a layout of a chip, FIG. 5 illustrates the global layout of a chip. Chip 114 is composed of multiple units where each unit may contain different tilings for the wild-type sequence or multiple wild-type sequences. Unit 1 is shown in greater detail and shows that each unit is composed of multiple cells, which are areas on the chip that may contain probes. Conceptually, each unit includes multiple sets of related cells. As used herein, the term “cell” refers to a region on a substrate that contains many copies of a molecule or molecules (e.g., nucleic acid probes).

[0049] Each unit is composed of multiple cells that may be placed in rows (or “lanes”) and columns. In one embodiment, a set of five related cells includes the following: a wild-type cell 220, “mutation” cells 222, and a “blank” cell 224. Cell 220 contains a wild-type probe that is the complement of a portion of the wild-type sequence. Cells 222 contain “mutation” probes for the wild-type sequence. For example, if the wild-type probe is 3′-ACGT, the probes 3′-ACAT, 3′-ACCT, 3′-ACGT, and 3′-ACTT may be the “mutation” probes. Cell 224 is the “blank” cell because it contains no probes (also called the “blank” probe). As the blank cell contains no probes, labeled targets should not bind to the chip in this area. Thus, the blank cell provides an area that can be used to measure the background intensity. In preferred embodiments, there is only one cell for the wild-type probes.
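The relationship between a wild-type probe and its “mutation” probes in the example above can be illustrated with a short Python sketch. The function name is hypothetical, and the position of the varied base (the third base, index 2) is inferred from the four example probes rather than stated in the text.

```python
BASES = "ACGT"


def substitution_probes(wild_type_probe, position):
    """Return four probes carrying A, C, G, and T at the given 0-based position.

    One of the four is necessarily identical to the wild-type probe.
    """
    prefix = wild_type_probe[:position]
    suffix = wild_type_probe[position + 1:]
    return [prefix + base + suffix for base in BASES]


# Wild-type probe 3'-ACGT with the third base varied (inferred position).
print(substitution_probes("ACGT", 2))  # ['ACAT', 'ACCT', 'ACGT', 'ACTT']
```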

[0050] Referring again to FIG. 4, at a step 206 the masks for the synthesis are designed. At a step 208 the software utilizes the mask design and layout information to make the DNA or other polymer chips. This software 208 will control, among other things, relative translation of a substrate and the mask, the flow of desired reagents through a flow cell, the synthesis temperature of the flow cell, and other parameters. At a step 210, another piece of software is used in scanning a chip thus synthesized and exposed to a labeled target. The software controls the scanning of the chip, and stores the data thus obtained in a file that may later be utilized to extract sequence information.

[0051] At a step 212 a computer system utilizes the layout information and the fluorescence information to evaluate the hybridized nucleic acid probes on the chip. Among the important pieces of information obtained from DNA chips are the identification of mutant targets and determination of genetic sequence of a particular target.

[0052] FIG. 6 illustrates the binding of a particular target DNA to an array of DNA probes 114. As shown in this simple example, the following probes are formed in the array (only one probe is shown for the wild-type probe):

[0053] 3′-AGAACGT

[0054] AGACCGT

[0055] AGAGCGT

[0056] AGATCGT

[0057] •

[0058] •

[0059] •

[0060] As shown, the set of probes differ by only one base, a single base mismatch at an interrogation position, so the probes are designed to determine the identity of the base at that location in the nucleic acid sequence. Accordingly, when used herein a unit will refer to multiple sets of related probes, where each set includes probes that differ by a single base mismatch at an interrogation position.

[0061] When a fluorescein-labeled (or otherwise marked) target with the sequence 5′-TCTTGCA is exposed to the array, it is complementary only to the probe 3′-AGAACGT, and fluorescein will be primarily found on the surface of the chip where 3′-AGAACGT is located. Thus, for each set of probes that differ by only one base, the image file will contain four fluorescence intensities, one for each probe. Each fluorescence intensity can therefore be associated with the nucleotide or base of each probe that is different from the other probes. Additionally, the image file will contain a “blank” cell that can be used as the fluorescence intensity of the background. By analyzing the fluorescence intensities associated with a specific base location, it becomes possible to extract sequence information from such arrays using the methods of the invention disclosed herein.
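The complementarity in this example can be checked with a brief Python sketch. The function names are hypothetical. Because the target is written 5′ to 3′ and the probes 3′ to 5′, the two strands align position by position and only a base-wise complement is needed.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complementary_probe(target_5_to_3):
    """Return the perfect-match probe, written 3'->5', for a 5'->3' target."""
    return "".join(COMPLEMENT[base] for base in target_5_to_3)


def perfect_matches(target_5_to_3, probes_3_to_5):
    """Return the probes that are exactly complementary to the target."""
    expected = complementary_probe(target_5_to_3)
    return [probe for probe in probes_3_to_5 if probe == expected]


# Probe set from the example above, each written 3'->5'.
probes = ["AGAACGT", "AGACCGT", "AGAGCGT", "AGATCGT"]
print(perfect_matches("TCTTGCA", probes))  # ['AGAACGT']
```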

[0062] FIG. 7 illustrates probes arranged in lanes on a chip. A reference sequence (or chip wild-type sequence) is shown with five interrogation positions marked with number subscripts. An interrogation position is oftentimes a base position in the reference sequence where the target sequence may contain a mutation or otherwise differ from the reference sequence. The chip may contain five probe cells that correspond to each interrogation position. Each probe cell contains a set of probes that have a common base at the interrogation position. For example, at the first interrogation position, I₁, the reference sequence has a base T. The wild-type probe for this interrogation position is 3′-TGAC where the base A in the probe is complementary to the base at the interrogation position in the reference sequence.

[0063] Similarly, there are four “mutant” probe cells for the first interrogation position, I₁. The four “mutant” probes are 3′-TGAC, 3′-TGCC, 3′-TGGC, and 3′-TGTC. Each of the four “mutant” probes varies by a single base at the interrogation position. As shown, the wild-type and “mutant” probes are arranged in lanes on the chip. One of the “mutant” probes (in this case 3′-TGAC) is identical to the wild-type probe and therefore does not evidence a mutation. However, the redundancy may be utilized to give a visual indication of substitution mutations as will be seen in FIG. 8.

[0064] Still referring to FIG. 7, the chip contains wild-type and “mutant” probes for each of the other interrogation positions I₂-I₅. In each case, the wild-type probe is equivalent to one of the “mutant” probes.

[0065] FIG. 8 illustrates a hybridization pattern of a target on a chip with a reference sequence as in FIG. 7. The reference sequence is shown along the top of the chip for comparison. The chip includes a WT-lane (wild-type), an A-lane, a C-lane, a G-lane, and a T-lane (or U). Each lane is a row of cells containing probes. The cells in the WT-lane contain probes that are complementary to the reference sequence. The cells in the A-, C-, G-, and T-lanes contain probes that are complementary to the reference sequence except that the named base is at the interrogation position.

[0066] In one embodiment, the hybridization of probes in a cell is determined by the fluorescent intensity (e.g., photon counts) of the cell resulting from the binding of marked target sequences. The fluorescent intensity may vary greatly among cells. For simplicity, FIG. 8 shows a high degree of hybridization by a cell containing a darkened area. The WT-lane allows a simple visual indication that there is a mutation at interrogation position I₄ because the wild-type cell is not dark at that position. The cell in the C-lane is darkened which indicates that the mutation is from T->G (the probes are complementary so the C-cell indicates a G mutation). In a preferred embodiment, the WT-Lane is not utilized so four cells (not including any “blank” cell) are utilized to call a base at an interrogation position.
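As a rough illustration of calling a base from four cells, the following Python sketch simply picks the brightest cell after background subtraction and reports the complementary base. This brightest-cell rule is a deliberate simplification assumed here for illustration; it is not the base-calling method of the specification or of the incorporated applications, and the function name and intensity values are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(lane_intensities, background=0.0):
    """Return the target base implied by the brightest probe cell.

    lane_intensities maps the probe base at the interrogation position
    ("A", "C", "G", "T") to the measured intensity of that cell.
    background is the intensity of the "blank" cell, if one is available.
    """
    corrected = {base: value - background
                 for base, value in lane_intensities.items()}
    brightest_probe_base = max(corrected, key=corrected.get)
    # Probes are complementary to the target, so report the complement.
    return COMPLEMENT[brightest_probe_base]


# Hypothetical photon counts: the C-lane cell is brightest, implying a G.
cells = {"A": 120.0, "C": 950.0, "G": 140.0, "T": 110.0}
print(call_base(cells, background=100.0))  # G
```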

[0067] In practice, the fluorescent intensities of cells near an interrogation position having a mutation are relatively dark creating “dark regions” around a mutation. The lower fluorescent intensities result because the cells at interrogation positions near a mutation do not contain probes that are perfectly complementary to the target sequence; thus, the hybridization of these probes with the target sequence is lower. For example, the relative intensity of the cells at interrogation positions I₃ and I₅ may be relatively low because none of the probes therein are complementary to the target sequence. Although the lower fluorescent intensities reduce the resolution of the data, the methods of the present invention provide highly accurate base calling within the dark regions around a mutation and are able to identify other mutations within these regions.

[0068] FIG. 9 illustrates standard and standard variant tilings on a chip. As shown, the chip includes fourteen probe sets (probe sets 1-14). The odd probe sets include sense probes and the even probe sets (indicated by the cross hatching) include anti-sense probes. Probe sets 1 and 2 are tiled (i.e., designed and synthesized on the chip) to include probes complementary to the reference sequence, typically with a substitution position near the middle of the probe. In order to increase the accuracy of the analysis, preferred embodiments include standard variant tilings (shown as probe sets 3-14). Probes in the standard variant tilings are also complementary to the reference sequence; however, the probes have a substitution position and/or length that differs from the probes in the standard tiling. Each position may include one to six pairs of standard variant tiling probe sets, which may be varied according to how likely it is believed that there may be a mutation at that position. Although twelve standard variant tiling probe sets are shown, the number may be varied as desired.

[0069] The expanded section at the bottom left portion of FIG. 9 illustrates that each block of a probe set typically includes four cells, denoted A, C, G, and T. The probe set may also include a cell for detecting deletion mutations (i.e., the interrogation position base is absent) and/or a “blank” cell for determining background intensity. The base designations specify which base is at the interrogation position of each probe within the cell. Typically, there are hundreds or thousands of identical nucleic acid probes within each cell.

[0070] Although in preferred embodiments the cells may be arranged adjacent to each other in sequential order along the reference sequence, there is no requirement that the cells be in any particular location as long as the location on the chip is determinable. Additionally, although it may be beneficial to synthesize the different groups on a single chip for consistency of experiments, the methods of the present invention may be advantageously utilized with data from different tilings on different chips.

[0071] Embodiments of the invention may be utilized to detect monomer changes in a heterogeneous sample when an unknown quantity of wild-type monomers may also be present. For example, mutations in the p53 gene have been identified as a potential prelude to some cancers. Tissue samples from a tumor will typically include a cellular mixture so it would be beneficial to identify mutations in the nucleic acid sequences of the mixture in the presence of wild-type nucleotides. The following will describe embodiments that analyze heterogeneous samples including nucleic acid sequences to detect mutations in the p53 gene. However, the invention is not limited to this application and may be advantageously applied to analyzing other genes and different types of sequences (e.g., peptides) as examples.

[0072] In order to detect mutations in a heterogeneous sample of nucleic acid sequences, embodiments of the invention compare the hybridization affinity between a homogeneous sample and a set of probes to the hybridization affinity between the heterogeneous sample and a set of probes. A homogeneous sample includes primarily one nucleic acid sequence (the reference sequence) or fragments thereof. There may be small concentrations of test sequences that have been added for quality control purposes, but the sample is considered to be homogeneous. The heterogeneous sample includes the reference sequence and mutations of that sequence, be it a substitution, deletion, insertion, or multiple base deletion.

[0073] Typically, the probes for analyzing the homogeneous and heterogeneous samples are the same, but this is not required. As discussed above, the homogeneous sample is utilized as a reference for analyzing the heterogeneous sample. The homogeneous and heterogeneous samples are preferably hybridized to probes on a chip under the same conditions. In preferred embodiments, the homogeneous sample includes wild-type nucleic acid sequences and the probes are tiled on a chip for these wild-type nucleic acid sequences.

[0074] In order to illustrate one process of detecting mutations, FIG. 10 shows a bar graph including hybridization affinity of a homogeneous sample (or “reference”) and a heterogeneous sample (here designated as “sample”). In this example, the homogeneous sample includes sequences having a wild-type base at the position being analyzed so it is expected that the hybridization affinity of the reference sequences to the probe that includes the wild-type base would be highest. The bar graph shows that the hybridization affinity of the probe that includes the wild-type base (i.e., a C at this position) is by far the highest. The hybridization affinities shown are fairly typical and it should be noted that the hybridization affinities of the other probes are not zero. This may be due to the specific interactions of the nucleotides, cross-hybridization or other reasons.

[0075] The shaded bars in FIG. 10 represent the hybridization affinity of a heterogeneous sample to the same probes. The heterogeneous sample includes nucleic acid sequences that are similar to the reference sequences, but there may be mutations present. As shown, the hybridization affinities of the heterogeneous sample are similar to the hybridization affinities of the homogeneous sample. However, the hybridization affinity of the wild-type probe decreased slightly while the hybridization affinity of the probe having a T at the interrogation position increased. This may indicate that some of the sample sequences have a mutation (i.e., a substitution to A since the probes are complementary to the sequences) at the position being analyzed.
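Merely by way of illustration, the comparison described in reference to FIG. 10 may be sketched as follows. The function name, the dictionary layout and the intensity values are hypothetical and are not taken from the figures; the sketch only looks for a decrease in the wild-type probe intensity together with an increase in a non-wild-type probe intensity, and reports the complementary base.

```python
# Illustrative sketch only: names, data layout and values are assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def suggest_substitution(reference, sample, wild_type_probe_base):
    """Return the sequence base suggested by the probe pattern, or None.

    reference, sample: dicts mapping the probe base at the interrogation
    position to a hybridization intensity.
    """
    wt = wild_type_probe_base
    # The wild-type probe intensity must decrease in the sample.
    if sample[wt] >= reference[wt]:
        return None
    # Find the non-wild-type probe with the largest intensity increase.
    gains = {b: sample[b] - reference[b] for b in reference if b != wt}
    best = max(gains, key=gains.get)
    if gains[best] <= 0:
        return None
    # Probes are complementary to the sequences, so report the complement.
    return COMPLEMENT[best]

# Hypothetical intensities shaped like the FIG. 10 example (wild-type probe C).
reference = {"A": 40.0, "C": 900.0, "G": 55.0, "T": 60.0}
sample = {"A": 42.0, "C": 820.0, "G": 50.0, "T": 210.0}
print(suggest_substitution(reference, sample, "C"))  # A
```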

[0076] FIG. 11 shows a flowchart of a process that analyzes hybridization affinities for reference and heterogeneous samples, such as for the detection of mutations. The flowchart provides the high level flow of mixture analysis and specific details of preferred embodiments will be provided in the following figures and description. At a step 301, hybridization affinities for a homogeneous sample are received by a computer system. The hybridization affinities may be represented by photon counts from a fluorescein marker that are stored in a file. The file may be obtained by conventional mechanisms such as over a network or on a removable storage device (e.g., CD-ROM).

[0077] At a step 303, the computer system receives hybridization affinities for a heterogeneous sample. The hybridization affinities for the heterogeneous sample will typically be stored in a way similar to the hybridization affinities for the homogeneous sample. After the hybridization affinities for the reference and heterogeneous samples are received, the system compares the hybridization affinities of the reference and heterogeneous samples at a step 305. There are many different ways that the hybridization affinities may be compared, including the way described in reference to FIG. 10 (i.e., detecting a decrease in the wild-type probe affinity and an increase in a non-wild-type probe affinity in the heterogeneous sample). The details of other ways of comparing the hybridization affinities will be described in reference to later figures.

[0078] The system compares the hybridization affinities to identify the one or more monomers at a position in the sequences of the heterogeneous sample at a step 307. As an example, if the system detects a substitution mutation at a position, the system may indicate this to the user by “C/T,” which means that a mutation to C was detected in the sample and the wild-type base is T. If the system does not detect a mutation, the system may indicate this to the user with a “T” for the wild-type base.

[0079] Although the invention may be utilized in many applications, detecting mutations in the p53 gene of a heterogeneous sample will be described herein. FIG. 12 shows a section of the p53 gene. As shown, along the p53 gene are different regions including introns, exons and splice junctions. Chips may be designed that include probes for the cDNA regions (i.e., the exon cores and splice junctions), “genomic regions” between the introns, both, the whole gene, or any other parts of the gene. When analyzing the hybridization affinities, the system may determine if data for a region is acceptable. For simplicity, the following will describe checking data for an exon region. However, the region may be any region or set of regions on the gene.

[0080] Now that a process of mixture analysis and chip design has been described, a process of hybridization affinity comparison will be described. FIG. 13 shows a flowchart of a process of hybridization affinity analysis. The flowchart is one embodiment of step 305 of FIG. 11. At a step 351, the system tests regions for acceptable data. As discussed earlier, the regions may be exon regions. The system may determine if the hybridization affinities in a region are acceptable and, if they are not, the system may not analyze any of the individual sites or positions in the region. For example, if more than a predetermined number of probe sets (see discussion of FIG. 9) do not have enough discrimination between wild-type probes and non-wild-type probes in the region, the system may deem the data for the region unacceptable.

[0081] At a step 353, the system tests the individual sites for acceptable data. For example, the system may subtract a background intensity (e.g., derived from a “blank” probe) from each of the intensities for each probe of a probe set. If the background subtracted intensities of the probes are not all above a minimum threshold, the system may deem the data from the probes in the probe set unacceptable.
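As a minimal sketch of the site test just described, and assuming the intensities of a probe set are held in a simple list, the background subtraction and threshold check might look like the following. The function name, parameter names and numeric values are hypothetical.

```python
# Illustrative sketch only: names and values are assumptions.
def site_data_acceptable(intensities, background, min_intensity):
    """Subtract the background from each probe intensity of a probe set and
    report whether every background subtracted intensity is above the
    minimum threshold."""
    subtracted = [value - background for value in intensities]
    acceptable = all(value > min_intensity for value in subtracted)
    return acceptable, subtracted

# Hypothetical probe set: four probe intensities and a blank-probe background.
ok, corrected = site_data_acceptable([120.0, 950.0, 130.0, 140.0], 100.0, 10.0)
print(ok, corrected)  # True [20.0, 850.0, 30.0, 40.0]
```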

[0082] If the region has been determined to have acceptable data and some data at a site is deemed acceptable, the system can perform a test for a deletion at a step 355. In order to test for a deletion, a probe is synthesized on the chip that would be complementary to a deletion. For example, referring back to FIG. 7, the four probes are 3′-TGAC, 3′-TGCC, 3′-TGGC, and 3′-TGTC, where the interrogation position is the third base of each probe. In order to test for a deletion at this interrogation position, a probe 3′-TGC is synthesized on the chip. In practice, the lengths of the probes are typically longer (e.g., 12-mers to 15-mers), but the shorter probes are used herein for illustrative purposes.

[0083] Each probe set at a site or position is analyzed to determine if the probe set indicates that there has been a deletion mutation at this position. If the number of probe sets that indicate there has been a deletion exceeds a threshold, the system may indicate that there has been a deletion at this position.

[0084] At a step 357, the system performs a test for a substitution. Assuming the region has been determined to have acceptable data and some data at the site is deemed acceptable, the system analyzes the hybridization affinities of the probes of each probe set to determine if the probe set indicates that there was a substitution mutation. If more than a predetermined number of probe sets agree that there has been a substitution, the system may indicate that there has been a substitution at this position.
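The agreement rule used for the deletion and substitution tests can be illustrated with a short sketch. It assumes that each probe set has already been reduced to a true/false indication; the function and parameter names are hypothetical.

```python
# Illustrative sketch only: names are assumptions.
def mutation_indicated(probe_set_indications, min_agreeing):
    """Return True when more than min_agreeing probe sets indicate the
    mutation being tested (e.g., a deletion or a substitution)."""
    agreeing = sum(1 for indicated in probe_set_indications if indicated)
    return agreeing > min_agreeing

# Three of four probe sets indicate the mutation; the threshold is two.
print(mutation_indicated([True, True, False, True], 2))  # True
```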

[0085] The probe sets can include probes to test for other mutations including insertions and multiple-base deletions. Accordingly, the flowchart of FIG. 13 can include steps for testing sites for insertions, multiple base deletions, and the like. Insertion mutations are detected by analyzing probe sets that have been tiled on the chip for detecting an insertion at a specific position. For example, there may be four insertion probes that each include a different base that has been added between two adjacent bases in the reference sequence. A determination of whether there has been an insertion may be based on whether a predetermined number of probe sets agree that there has been an insertion. Multiple base deletion probes are similar to the single deletion probe described above except that more than one base has been deleted. Chips can be synthesized that include probes for deletions, insertions and multiple base deletions for each site or only at designated sites.

[0086] The preceding description has described the invention but it may be beneficial to describe a preferred embodiment of the invention in detail. FIGS. 20A-20G show formulas that are utilized in a preferred embodiment. These formulas will be described in reference to flowcharts that illustrate this embodiment. Unless otherwise indicated, the hybridization intensities of the probes are background subtracted.

[0087] FIG. 14 shows a flowchart of a process of mutation detection in a heterogeneous sample of nucleic acids. The flowchart begins after the relevant hybridization affinity data has been input into the system. The hybridization affinity data includes the probe sequence and the hybridization affinity (or intensity) for the probe, which may be calculated as the mean of the photon counts from a cell that includes the probe. In preferred embodiments, the hybridization affinity data for the reference and heterogeneous samples were obtained under the same conditions.

[0088] For simplicity, the flowchart will describe a process of detecting mutations in the multiple sites of an exon. It should be readily understood that the process may be extended to analyze multiple exons or different regions altogether.

[0089] At a step 401, the system performs an exon quality test. The purpose of the exon quality test is to detect and eliminate from analysis an exon that has hybridization affinity data that will likely have a high error rate. The exon quality test measures the degree to which hybridization intensity values discriminate between the wild-type probe and the three non-wild-type probes in a probe set. It has been determined that less discrimination results in higher error rates for the exon and it may be that the error rate increases exponentially with decreasing discrimination.

[0090] With the exon quality test, a DiscQualityFilter value is calculated (see FIG. 20C). In order to calculate the DiscQualityFilter value, a ratio of the hybridization affinity of the wild-type probe to the average of the hybridization affinities of the non-wild-type probes is calculated for each probe set. The average of the ratios for each probe set is calculated to produce the DiscQualityFilter value. Probe sets that include one or more probes that have a hybridization affinity lower than a background intensity may be excluded from calculating the DiscQualityFilter value.

[0091] In general, the higher the DiscQualityFilter value, the lower the error rates for the exon are expected. For each exon, the DiscQualityFilter value is compared to an ExonIntDiscCutoff value and if the DiscQualityFilter value is less than the ExonIntDiscCutoff value, the hybridization affinity data for the exon fails and is deemed unacceptable. Otherwise, the hybridization affinity data for the exon is deemed acceptable. Each exon may have a different ExonIntDiscCutoff value, which may be determined empirically.
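By way of example only, the exon quality test may be sketched as follows. The sketch follows the verbal description above rather than the formula of FIG. 20C, which is not reproduced here. The tuple layout for a probe set, the function names and the numeric values are hypothetical, and the intensities and background are assumed to be on the same scale.

```python
# Illustrative sketch only: names, data layout and values are assumptions.
def disc_quality_filter(probe_sets, background):
    """Average, over probe sets, of the wild-type intensity divided by the
    mean non-wild-type intensity. Probe sets containing any probe below the
    background intensity are excluded. Returns None if nothing remains."""
    ratios = []
    for wild_type, non_wild_type in probe_sets:
        if wild_type < background or any(p < background for p in non_wild_type):
            continue
        mean_non_wt = sum(non_wild_type) / len(non_wild_type)
        if mean_non_wt <= 0:
            continue  # guard against division by zero
        ratios.append(wild_type / mean_non_wt)
    if not ratios:
        return None
    return sum(ratios) / len(ratios)

def exon_data_acceptable(disc_value, exon_int_disc_cutoff):
    """The exon fails when the value is less than its cutoff."""
    return disc_value is not None and disc_value >= exon_int_disc_cutoff

# Hypothetical exon with three probe sets; the third is excluded (1.0 < 5.0).
probe_sets = [
    (100.0, [10.0, 10.0, 10.0]),
    (60.0, [20.0, 20.0, 20.0]),
    (50.0, [1.0, 10.0, 10.0]),
]
value = disc_quality_filter(probe_sets, background=5.0)
print(value, exon_data_acceptable(value, 5.0))  # 6.5 True
```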

[0092] At a step 403, the system gets probe set data for a site. It is then determined if the site is located in an exon with acceptable data at a step 405. The determination of whether the exon has acceptable hybridization affinity data was calculated at step 401, which would typically perform the exon quality test for all the exons of interest. If the probe set is for a site that is located in an exon with unacceptable data, the site is called as unknown or “N.”

[0093] Otherwise, if the probe set is for a site that is located in an exon with acceptable data, the system performs a site quality test at a step 409. The purpose of the site quality test is to remove probe sets that do not have acceptable data quality from the site calculation. If a probe set for the homogeneous sample is deemed to have unacceptable data, the corresponding probe set for the heterogeneous sample is also removed, and vice versa.

[0094] Probe sets will be removed from analysis of the reference and heterogeneous samples by the site quality test if any one of four conditions is true. The first condition is if RefMaxInt is less than IntCutoff. RefMaxInt is the maximum hybridization intensity of a probe in the reference probe set (see FIG. 20C). If this maximum is less than a predetermined threshold IntCutoff, then the probe sets are removed. The second condition is if UKMaxInt is less than IntCutoff. UKMaxInt is similar to RefMaxInt and is the maximum hybridization intensity of a probe in the sample probe set. If this maximum is less than IntCutoff, then the probe sets are removed.

[0095] The third condition is if RefIntDisc is less than MinIntDisc. This condition tests the intensity discrimination of the reference probe set. The RefIntDisc value is the ratio of the raw hybridization affinity of the wild-type probe (i.e., not background subtracted) to the average of the raw hybridization affinities of the non-wild-type probes. If RefIntDisc is less than a predetermined MinIntDisc, then the probe sets are removed. The fourth condition is if VectorRatio is greater than MaxVectorRatio, which is a predetermined value. This condition tests whether the magnitudes of the vectors formed by the four hybridization intensities of the probe set differ by more than a threshold between the reference and unknown (see FIGS. 20B and 20C). If VectorRatio is greater than MaxVectorRatio, then the probe sets are removed.
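The four conditions of the site quality test can be sketched as below. The exact formulas appear in FIGS. 20B and 20C, which are not reproduced here, so the sketch rests on stated assumptions: in particular, VectorRatio is assumed to be the larger of the two Euclidean vector magnitudes divided by the smaller, which may differ from the formula in the figures. The function and parameter names and the numeric values are hypothetical.

```python
# Illustrative sketch only: the VectorRatio definition is an assumption.
import math

def magnitude(intensities):
    """Euclidean magnitude of the vector of probe intensities."""
    return math.sqrt(sum(x * x for x in intensities))

def passes_site_quality(ref, uk, ref_raw_wt, ref_raw_non_wt,
                        int_cutoff, min_int_disc, max_vector_ratio):
    """Return True when none of the four removal conditions is true.

    ref, uk: background subtracted intensities of the reference and sample
    (unknown) probe sets. ref_raw_wt, ref_raw_non_wt: raw reference
    intensities of the wild-type probe and the non-wild-type probes.
    """
    # Conditions 1 and 2: RefMaxInt or UKMaxInt below IntCutoff.
    if max(ref) < int_cutoff or max(uk) < int_cutoff:
        return False
    # Condition 3: RefIntDisc below MinIntDisc (uses raw intensities).
    mean_non_wt = sum(ref_raw_non_wt) / len(ref_raw_non_wt)
    if mean_non_wt <= 0 or ref_raw_wt / mean_non_wt < min_int_disc:
        return False
    # Condition 4: VectorRatio above MaxVectorRatio (assumed definition).
    small, large = sorted((magnitude(ref), magnitude(uk)))
    if small == 0:
        return False
    return large / small <= max_vector_ratio

ref = [50.0, 900.0, 60.0, 70.0]
uk = [55.0, 800.0, 65.0, 200.0]
print(passes_site_quality(ref, uk, 950.0, [100.0, 110.0, 120.0],
                          int_cutoff=100.0, min_int_disc=2.0,
                          max_vector_ratio=1.5))  # True
```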

[0096] After the site quality test is performed, it is determined if the number of probe sets remaining is greater than zero at a step 411. If not, the site is called an unknown or “N” at a step 413. Otherwise, the system performs a test for a deletion mutation at a step 415. The test for a deletion mutation is shown in more detail in FIG. 15.

[0097] FIG. 15 shows a flowchart of a process of testing for a deletion mutation. At a step 501, a deletion filter is performed. The deletion filter calculates a delRatio for each probe set that passed the site quality test. The calculation for delRatio is shown in FIG. 20C and if the value is greater than zero, the probe set indicates that there is a deletion. If the number of probe sets that make a deletion mutant call with the deletion filter does not exceed a predetermined threshold at a step 503, the site is given a deletion score of zero at a step 505, meaning that a deletion mutation has not been indicated and the site will be tested for a substitution mutation. Otherwise, the similarity filter is performed at a step 507.

[0098] The similarity filter tests whether any of the sample probe sets have the “same” intensity pattern as that of any of the reference probe sets. The rationale is that random experimental variation may cause differences in the intensity patterns. Therefore, it would be a nonrandom event if both the reference and sample probe sets have a very nearly identical hybridization pattern. Such an event would likely only be caused by a wild-type base at the interrogation position. The test for the same hybridization pattern may be computed by a dot product between the four reference intensities and the four sample intensities. If the similarity of any of the probe set pairs is greater than a cutoff, the site does not pass the similarity filter at a step 509 and will be tested for a substitution mutation (by setting the deletion score to zero at a step 505). In preferred embodiments, the hybridization intensity patterns should be very nearly identical before they fail the similarity filter.
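One possible form of the similarity filter is sketched below. It assumes, as an illustration only, that the similarity is the dot product of the two intensity vectors normalized by their magnitudes (so that identical patterns score approximately 1.0), and that reference and sample probe sets are compared as corresponding pairs. The function names, the cutoff and the intensity values are hypothetical.

```python
# Illustrative sketch only: the normalization and pairing are assumptions.
import math

def pattern_similarity(ref, uk):
    """Normalized dot product of two four-probe intensity patterns."""
    dot = sum(r * s for r, s in zip(ref, uk))
    norm = math.sqrt(sum(r * r for r in ref)) * math.sqrt(sum(s * s for s in uk))
    return dot / norm if norm else 0.0

def fails_similarity_filter(ref_sets, uk_sets, cutoff):
    """True when any reference/sample probe set pair is more similar than
    the cutoff, i.e., the patterns are very nearly identical."""
    return any(pattern_similarity(r, s) > cutoff
               for r, s in zip(ref_sets, uk_sets))

ref_sets = [[50.0, 900.0, 60.0, 70.0]]
same_sets = [[50.0, 900.0, 60.0, 70.0]]
changed_sets = [[55.0, 400.0, 65.0, 600.0]]
print(fails_similarity_filter(ref_sets, same_sets, 0.999))     # True
print(fails_similarity_filter(ref_sets, changed_sets, 0.999))  # False
```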

[0099] At a step 511, the system calculates a deletion score. The deletion score for each probe set is the sum of two “mixture variables”: dot metric and dRatio. The dot metric is correlated with increasing differences in the probe set intensities of the sample relative to the reference, but does not quantitate any specific pattern of differences (see FIG. 20E). The dRatio variable is correlated with the degree to which a non-wild-type probe intensity increases while the wild-type probe intensity decreases in the sample, relative to the reference (see FIG. 20E). The sum of dot metric and dRatio is the deletion score, in which generally a higher deletion score indicates a greater likelihood of a deletion mutation.

[0100] Returning to FIG. 14, the deletion score is compared to a deletion cutoff at a step 417. If the deletion score is greater than the deletion cutoff, the site is called as a deletion at a step 419. For example, the site may be called as “-/T,” where the dash indicates a deletion and the T indicates the wild-type base. In a preferred embodiment, the deletion cutoff varies depending on the number of probe sets that pass the site quality test.

[0101] If the deletion score is not greater than the deletion cutoff at step 417, the system performs a test for a substitution mutation at a step 421. The test for a substitution mutation is shown in more detail in FIG. 16.

[0102] FIG. 16 shows a flowchart of a process of testing for a substitution mutation. At a step 601, a substitution filter is performed. The substitution filter produces three ratios for each probe set that passed the site quality test. Each of the three ratios is produced by dividing the wild-type probe intensity by a non-wild-type probe intensity. For example, FIG. 17 shows reference and sample probe set intensities. The wild-type probe intensity is designated “WT” and the non-wild-type probe intensities are designated “P1,” “P2” and “P3.” The ratios WT/P1, WT/P2 and WT/P3 are calculated for each probe set.

[0103] When the fraction of non-wild-type base relative to wild-type base at a site increases, the intensity of one of the non-wild-type probes increases while the intensity of the wild-type probe decreases. Therefore, the presence of a substitution mutation will typically decrease one of the three ratios for the sample relative to the same ratio for the reference. A probe set may indicate a substitution to the base specified by the non-wild-type probe with the greatest decrease (if any) in the ratios, provided the probe set also passes tests for the “shape” of the intensity pattern differences as described in reference to FIG. 18.

[0104] FIG. 18 shows a flowchart of a process of a substitution filter. At a step 651, the system computes a mutRatio for each non-wild-type probe. The mutRatio is a ratio of wild-type and non-wild-type intensities from the reference and sample (see FIG. 20C). The higher the value, the more likely there is a substitution mutation.

[0105] At a step 653, the system sorts the three mutRatio values in descending order and renames the values so that 1mutRatio>2mutRatio>3mutRatio (i.e., 1mutRatio is the highest value). The system then calculates the mutRatioDiff at a step 655, which is the difference between 1mutRatio and 2mutRatio. There are two tests performed at a step 657 to make a putative base call. If either test is passed, the probe set indicates that the site is a substitution mutation. The two tests, Test₁ and Test₂, are shown in FIG. 20G. In general, Test₁ requires more probe sets to agree on the call but has a less stringent “shape” requirement to make a putative mutant call than Test₂. If both tests fail, the probe set is treated as indicating that the site is wild-type.
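Steps 651 through 655 may be illustrated with the following sketch. The formula of FIG. 20C is not reproduced here; the sketch assumes that mutRatio takes the form recited in claim 13, namely the difference between the reference ratio WT/P and the sample ratio WT/P, divided by the reference ratio. Test₁ and Test₂ of FIG. 20G are not sketched. The function names and the intensity values are hypothetical.

```python
# Illustrative sketch only: the mutRatio formula is assumed from claim 13.
def mut_ratios(ref_wt, ref_non_wt, uk_wt, uk_non_wt):
    """Compute an assumed mutRatio for each non-wild-type probe.

    ref_non_wt, uk_non_wt: dicts mapping probe base -> intensity for the
    reference and the sample (unknown), respectively.
    """
    ratios = {}
    for base, ref_intensity in ref_non_wt.items():
        first = ref_wt / ref_intensity    # WT/P for the reference
        second = uk_wt / uk_non_wt[base]  # WT/P for the sample
        ratios[base] = (first - second) / first
    return ratios

def rank_mut_ratios(ratios):
    """Sort mutRatio values in descending order and return the ordering
    together with mutRatioDiff (highest minus second highest)."""
    ordered = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
    mut_ratio_diff = ordered[0][1] - ordered[1][1]
    return ordered, mut_ratio_diff

ratios = mut_ratios(900.0, {"A": 50.0, "G": 60.0, "T": 75.0},
                    800.0, {"A": 50.0, "G": 64.0, "T": 200.0})
ordered, diff = rank_mut_ratios(ratios)
print(ordered[0][0], round(diff, 3))  # T 0.5
```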

[0106] If the number of probe sets that make a substitution mutant call with the substitution filter does not exceed a predetermined threshold at a step 603, the site is given a substitution score of zero at a step 605, meaning the site will be called as wild-type. Otherwise, the similarity filter is performed at a step 607.

[0107] The similarity filter tests whether any of the sample probe sets have the “same” intensity pattern as that of any of the reference probe sets. The similarity filter may be the same as described in reference to step 507 in FIG. 15. If the similarity of any of the probe set pairs is greater than a cutoff, the site does not pass the similarity filter at a step 609 and will be called as wild-type (by setting the substitution score to zero at step 605). As mentioned earlier, in preferred embodiments, the hybridization intensity patterns should be very nearly identical before they fail the similarity filter.

[0108] At a step 611, the system calculates a substitution score. The substitution score for each probe set is the sum of four “mixture variables”: dot metric, dRatio, DNeighborRatio, and rank. The dot metric is correlated with increasing differences in the probe set intensities of the sample relative to the reference, but does not quantitate any specific pattern of differences (see FIG. 20E). The dRatio variable is correlated with the degree to which a non-wild-type probe intensity increases while the wild-type probe intensity decreases in the sample, relative to the reference (see FIG. 20E).

[0109] The DNeighborRatio variable is correlated with the degree to which the intensities of neighboring probe sets decrease, relative to the reference (see FIG. 20F). The rank variable is a binary metric which is set to 1 when the highest intensity probe in the sample is not the same as the highest intensity probe in the reference (see FIG. 20F). The sum of dot metric, dRatio, DNeighborRatio, and rank is the substitution score, in which generally a higher substitution score indicates a greater likelihood of a substitution mutation.

[0110] Returning to FIG. 14, the substitution score is compared to a substitution cutoff at a step 423. If the substitution score is greater than the substitution cutoff, the site is called as a substitution at a step 425. For example, the site may be called as “G/A,” where the G indicates the substitution mutation and the A indicates the wild-type base. In a preferred embodiment, the substitution cutoff varies depending on the number of probe sets that pass the site quality test.
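The overall decision flow of FIG. 14 can be summarized in a short sketch. It assumes the mixture variables, scores and cutoffs have already been computed elsewhere, and it covers only the unknown, deletion, substitution and wild-type outcomes. All function and parameter names are hypothetical.

```python
# Illustrative sketch only: names are assumptions; inputs are precomputed.
def substitution_score(dot_metric, d_ratio, d_neighbor_ratio, rank):
    """Sum of the four mixture variables described for step 611."""
    return dot_metric + d_ratio + d_neighbor_ratio + rank

def call_site(wild_type, exon_ok, n_probe_sets, deletion_score,
              deletion_cutoff, sub_score, sub_cutoff, mutant_base):
    """Return a site call such as "N", "-/T", "G/A" or "A"."""
    if not exon_ok:            # exon failed the exon quality test
        return "N"
    if n_probe_sets <= 0:      # no probe sets passed the site quality test
        return "N"
    if deletion_score > deletion_cutoff:
        return "-/" + wild_type
    if sub_score > sub_cutoff:
        return mutant_base + "/" + wild_type
    return wild_type

score = substitution_score(0.5, 0.25, 0.25, 1)
print(call_site("A", True, 3, 0.0, 1.0, score, 1.5, "G"))  # G/A
```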

[0111] Although not shown in FIG. 14, a system can test for other mutations including insertions and multiple base deletions. The flowcharts for these mutation conditions may be similar to the ones already shown, but FIG. 19 shows a flowchart of a process of testing for an insertion mutation.

[0112] At a step 671, an insertion filter is performed. The insertion filter calculates four ratios for each probe set that passed the site quality test. The calculation for each ratio is the same as for the ratios described in reference to FIG. 17, except that four ratios, WT/I1, WT/I2, WT/I3, and WT/I4, where I1-I4 represent the four insertion probes, are calculated. The presence of an insertion will typically increase one of the four ratios for the sample relative to the same ratio for the reference. If the number of probe sets that make an insertion mutant call with the insertion filter does not exceed a predetermined threshold at a step 673, the site is given an insertion score of zero at a step 675, meaning that an insertion mutation has not been indicated.

[0113] A similarity filter is performed at a step 677. The similarity filter can be the same as described in reference to step 507 of FIG. 15. If the similarity of any of the probe set pairs is greater than a cutoff, the site does not pass the similarity filter at a step 679 and will be called as wild-type (by setting the insertion score to zero at step 675).

[0114] At a step 681, the system calculates an insertion score. The insertion score for each probe set is the sum of three “mixture variables”: dot metric, dRatio and DNeighborRatio. The dot metric is correlated with increasing differences in the probe set intensities of the sample relative to the reference, but does not quantitate any specific pattern of differences (see FIG. 20E). The dRatio variable is correlated with the degree to which a non-wild-type probe intensity increases while the wild-type probe intensity decreases in the sample, relative to the reference (see FIG. 20E). The DNeighborRatio variable is correlated with the degree to which the intensities of neighboring probe sets decrease, relative to the reference (see FIG. 20F). The sum of dot metric, dRatio and DNeighborRatio is the insertion score, in which generally a higher insertion score indicates a greater likelihood of an insertion mutation.

[0115] With the present invention, mutations may be detected in a mixture of nucleic acid sequences in the presence of an unknown quantity of wild-type bases. Although the above description has described preferred embodiments, many variations of the invention will become apparent to those of skill in the art upon review of this disclosure. Merely by way of example, while the invention is illustrated primarily with regard to nucleic acid sequences, the invention may be advantageously applied to other polymers. The scope of the invention should, therefore, be determined not with reference to the above description, but instead should be determined with reference to the appended claims along with their full scope of equivalents.

CLAIMS

What is claimed is:
 1. A method of analyzing a heterogeneous sample of nucleic acids, comprising: receiving hybridization affinities of a homogeneous sample of nucleic acids to a plurality of nucleic acid probes; receiving hybridization affinities of a heterogeneous sample of nucleic acids to the plurality of nucleic acid probes; and comparing hybridization affinities of the homogeneous and heterogeneous samples to identify a mutation in the heterogeneous sample.
 2. The method of claim 1, wherein the plurality of nucleic acid probes includes at least one wild-type probe and non-wild-type probe.
 3. The method of claim 2, wherein the comparing hybridization affinities includes: calculating a first ratio of a hybridization affinity of a non-wild-type probe to a hybridization affinity of a wild-type probe for the homogeneous sample of nucleic acids; and calculating a second ratio of a hybridization affinity of a non-wild-type probe to a hybridization affinity of a wild-type probe for the heterogeneous sample of nucleic acids.
 4. The method of claim 3, wherein the mutation in the heterogeneous sample is identified if the first ratio is less than the second ratio.
 5. The method of claim 1, wherein the mutation is a substitution, deletion or insertion.
 6. The method of claim 1, wherein the comparing hybridization affinities includes testing a region of the nucleic acids for acceptable data.
 7. The method of claim 1, wherein the comparing hybridization affinities includes testing sites of the nucleic acids for acceptable data.
 8. The method of claim 1, wherein the comparing hybridization affinities includes testing sites of the nucleic acids for a deletion.
 9. The method of claim 1, wherein the comparing hybridization affinities includes testing sites of the nucleic acids for a substitution.
 10. The method of claim 1, wherein the comparing hybridization affinities includes testing sites of the nucleotides for an insertion.
 11. A computer program product for analyzing a heterogeneous sample of nucleic acids, comprising: computer code that receives hybridization affinities of a homogeneous sample of nucleic acids to a plurality of nucleic acid probes; computer code that receives hybridization affinities of a heterogeneous sample of nucleic acids to the plurality of nucleic acid probes; computer code that compares hybridization affinities of the homogeneous and heterogeneous samples to identify a mutation in the heterogeneous sample; and a computer readable medium that stores the computer codes.
 12. The computer program product of claim 11, wherein the computer readable medium is a floppy, tape, CD-ROM, hard drive, or flash memory.
 13. A method of analyzing a heterogeneous sample of nucleic acids, comprising: receiving hybridization affinities of a homogeneous sample of nucleic acids to a plurality of nucleic acid probes, the plurality of nucleic acid probes including a wild-type probe and at least one non-wild-type probe; receiving hybridization affinities of a heterogeneous sample of nucleic acids to the plurality of nucleic acid probes; calculating a first ratio of a hybridization affinity of a wild-type probe to a hybridization affinity of a non-wild-type probe for the homogeneous sample of nucleic acids; calculating a second ratio of a hybridization affinity of a wild-type probe to a hybridization affinity of a non-wild-type probe for the heterogeneous sample of nucleic acids; calculating a third ratio of the difference between the first and second ratios to the first ratio; and determining there is a mutation in the heterogeneous sample if the third ratio is above a predetermined threshold, the mutation being identified by the non-wild-type probe.
 14. The method of claim 13, wherein the mutation is a substitution, deletion or insertion.
 15. The method of claim 13, further comprising testing a region of the nucleic acids for acceptable data.
 16. The method of claim 13, further comprising testing sites of the nucleic acids for acceptable data.
 17. A computer program product for analyzing a heterogeneous sample of nucleic acids, comprising: computer code that receives hybridization affinities of a homogeneous sample of nucleic acids to a plurality of nucleic acid probes, the plurality of nucleic acid probes including a wild-type probe and at least one non-wild-type probe; computer code that receives hybridization affinities of a heterogeneous sample of nucleic acids to the plurality of nucleic acid probes; computer code that calculates a first ratio of a hybridization affinity of a wild-type probe to a hybridization affinity of a non-wild-type probe for the homogeneous sample of nucleic acids; computer code that calculates a second ratio of a hybridization affinity of a wild-type probe to a hybridization affinity of a non-wild-type probe for the heterogeneous sample of nucleic acids; computer code that calculates a third ratio of the difference between the first and second ratios to the first ratio; computer code that determines there is a mutation in the heterogeneous sample if the third ratio is above a predetermined threshold, the mutation being identified by the non-wild-type probe; and a computer readable medium that stores the computer codes.
 18. The computer program product of claim 17, wherein the computer readable medium is a floppy, tape, CD-ROM, hard drive, or flash memory.